bcftools filter -O z \
-o filtered2_sarscov2.vcf.gz \
-i 'DP>300' filtered_sarscov2.vcf.gz
Open the filtered VCF file to see the changes. Note that bcftools also records the filter
command in the VCF header.
In practice, you can apply several different filters to the variants in a VCF file to obtain
accurate and reliable results.
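To make the effect of the inclusion expression concrete, the following minimal Python sketch mimics what -i 'DP>300' does to plain-text VCF lines. It is an illustration only and does not show how bcftools is implemented. The function names (info_depth, keep_deep_records) and the three sample records are invented for this example, and records without a DP tag are dropped, which is an assumption of the sketch.

```python
# Sketch only: mimics the effect of `bcftools filter -i 'DP>300'` on plain-text
# VCF lines. Function names and sample records are hypothetical.

def info_depth(record_line):
    """Return the INFO/DP value of a VCF record line, or None if DP is absent."""
    info = record_line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP":
            return int(value)
    return None


def keep_deep_records(vcf_lines, min_depth=300):
    """Keep header lines and records whose INFO/DP is strictly above min_depth."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # header lines always pass through
            continue
        depth = info_depth(line)
        # Assumption of this sketch: a record with no DP tag is dropped.
        if depth is not None and depth > min_depth:
            kept.append(line)
    return kept


# Invented records for illustration; only the first has DP strictly above 300.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrA\t100\t.\tC\tT\t228\t.\tDP=512;MQ=60",
    "chrA\t200\t.\tA\tG\t90\t.\tDP=300;MQ=60",
    "chrA\t300\t.\tG\tA\t150\t.\tDP=120;MQ=59",
]

for line in keep_deep_records(example_vcf):
    print(line)
```

The comparison is strict, so a record with a depth of exactly 300 is excluded, just as the expression DP>300 implies.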
Another way to filter variants is to use “bcftools isec”, which takes a VCF file of truth
variants together with your raw VCF file as input and creates intersections, unions, and
complements of the two VCF files.
bcftools isec -c both -p isec truth.vcf.gz input.vcf.gz
Refer to bcftools help for more details.
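The set operations involved can be illustrated with a short Python sketch. It assumes that two records are the same variant only when CHROM, POS, REF, and ALT all match. This is a simplification: in bcftools, the -c option controls how records at the same position are treated. The function names and records below are invented for the example.

```python
# Sketch only: intersections and complements of two call sets, keyed on
# (CHROM, POS, REF, ALT). Names and records are hypothetical.

def variant_key(record_line):
    """Build a comparison key from the first five VCF columns."""
    chrom, pos, _id, ref, alt = record_line.rstrip("\n").split("\t")[:5]
    return (chrom, int(pos), ref, alt)


def compare_call_sets(truth_lines, input_lines):
    """Return variants private to each call set, shared by both, and their union."""
    truth = {variant_key(l) for l in truth_lines if not l.startswith("#")}
    calls = {variant_key(l) for l in input_lines if not l.startswith("#")}
    return {
        "truth_only": truth - calls,   # complement: in truth, missing from input
        "input_only": calls - truth,   # complement: in input, not in truth
        "shared": truth & calls,       # intersection
        "union": truth | calls,        # union
    }


truth_records = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chrA\t100\t.\tC\tT",
    "chrA\t200\t.\tA\tG",
]
input_records = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chrA\t100\t.\tC\tT",
    "chrA\t300\t.\tG\tA",
]

summary = compare_call_sets(truth_records, input_records)
for name, variants in summary.items():
    print(name, sorted(variants))
```

In this toy comparison, the shared set plays the role of confirmed calls, and the variants found only in the input file are candidates for removal.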
4.2.2 Haplotype-Based Variant Callers
Haplotype-based variant calling programs usually use a Bayesian probabilistic model
to predict variants from aligned reads based on the haplotype structure of the variants
rather than on the sequence alignment alone. A haplotype is a set of genetic variants that
are inherited together. Haplotype-based variant detection depends on physical phasing,
which is the process of inferring haplotype structure from genotypic data using the Bayesian
approach. The prediction relates the probability of a specific genotype given a set of reads
to the likelihood of sequencing errors in the reads and the prior probability of specific
genotypes. The Bayesian haplotype-based approach can model multiallelic loci, which
enables direct detection of longer, multi-base alleles from the sequence alignment.
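The Bayesian relationship described above can be sketched for the simplest case: a single biallelic site in a diploid sample. This is a deliberately reduced illustration. Real haplotype-based callers evaluate candidate haplotypes across a region rather than one site at a time. The error rate, the genotype priors, and the function name are assumptions chosen for the example and are not taken from any particular caller.

```python
import math

# Sketch only: posterior probability of each diploid genotype at one biallelic
# site, P(G | reads) proportional to P(reads | G) * P(G).
# The error rate and priors below are assumed values for illustration.

GENOTYPES = ("0/0", "0/1", "1/1")  # 0, 1, or 2 copies of the alternate allele


def genotype_posteriors(n_ref, n_alt, error_rate=0.01, priors=(0.98, 0.01, 0.01)):
    """Return {genotype: posterior} given counts of reference and alternate reads."""
    log_scores = []
    for alt_copies, prior in zip((0, 1, 2), priors):
        alt_fraction = alt_copies / 2
        # Chance that a single read shows the alternate base: a true alternate
        # allele read correctly, or a reference allele misread.
        p_alt = alt_fraction * (1 - error_rate) + (1 - alt_fraction) * error_rate
        log_likelihood = n_alt * math.log(p_alt) + n_ref * math.log(1 - p_alt)
        log_scores.append(log_likelihood + math.log(prior))
    # Normalize in log space so that high read depths do not underflow.
    top = max(log_scores)
    weights = [math.exp(score - top) for score in log_scores]
    total = sum(weights)
    return {g: w / total for g, w in zip(GENOTYPES, weights)}


# Roughly balanced reads favor the heterozygous genotype despite its low prior.
posteriors = genotype_posteriors(n_ref=10, n_alt=9)
best = max(posteriors, key=posteriors.get)
print(best, posteriors)
```

With only reference reads, the same function favors 0/0, because a handful of alternate reads can be explained by sequencing error while a balanced mix cannot.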
Using reads aligned to a reference sequence (BAM) as input, a haplotype-based algorithm
first attempts to identify the active regions of variation in the reads aligned to the
reference genome. The active regions are identified with dynamic sliding windows of a
certain size along the reference sequence. The number of events (mismatches,
InDels, and soft clips) is counted in each window. When the number of events in that
FIGURE 4.5 VCF file containing filtered variants.